SVD
== Tool ==

We have a tool for training SVD models (that is, for factorizing sparse matrices) which takes very general input in the form of CSV files. The documentation needs some work, but you can find it here: jssvd.

== Mathematical Description ==

=== Prediction Formula ===

: \hat{r}_{ui} = \mu + b_u + b_i + p_u^T q_i

where
: \mu is the overall average rating,
: b_u is the user bias for user u,
: b_i is the movie bias for movie i,
: p_u is the feature vector corresponding to the u^{th} user,
: q_i is the feature vector corresponding to the i^{th} movie.

Sometimes biases are not used. In this case, the prediction formula is:

: \hat{r}_{ui} = p_u^T q_i

Note that the two models are inherently the same, since we can increase the dimensionality of all p_u and q_i by 2 and write:

: p_{u,n+1} = b_u + \mu
: q_{i,n+1} = 1
: p_{u,n+2} = 1
: q_{i,n+2} = b_i

so that the two extra coordinates contribute exactly \mu + b_u + b_i to the dot product (a numeric check of this identity is given in the examples at the end of the page). However, training the two models may produce different results.

== Training Algorithm ==

Objective:

: \min_{q_*, p_*, b_*} \sum_{(u,i) \in K} \left( r_{ui} - \mu - b_u - b_i - p_u^T q_i \right)^2 + \lambda \left( b_u^2 + b_i^2 + \|q_i\|^2 + \|p_u\|^2 \right)

=== Weight update formula ===

For each pair (u, i) from the training set do:

: Calculate the residual: d_{ui} = r_{ui} - \hat{r}_{ui} (in the bias-free case, d_{ui} = r_{ui} - p_u^T q_i )
: \Delta q_i = \eta \left( d_{ui} p_u - \lambda q_i \right)
: \Delta p_u = \eta \left( d_{ui} q_i - \lambda p_u \right)
: q_i \leftarrow q_i + \Delta q_i
: p_u \leftarrow p_u + \Delta p_u

We will refer to these as the standard formulas; a worked training-loop sketch is given in the examples at the end of the page.

== Variants ==

=== Regression to average ===

: \Delta q_i = \eta \left( d_{ui} p_u - \lambda (q_i - \bar q) \right)
: \Delta p_u = \eta \left( d_{ui} q_i - \lambda (p_u - \bar p) \right)

where

: \bar q = {\sum_{j=1}^{N_\mathrm{items}} q_j \over N_\mathrm{items}}
: \bar p = {\sum_{j=1}^{N_\mathrm{users}} p_j \over N_\mathrm{users}}

The reasoning behind this is that the prior distribution for p_{jk} is N\left( {\bar p}_k, \sigma^2 \right) rather than N\left( 0, \sigma^2 \right) , i.e. the feature vectors are shrunk toward the population mean instead of toward zero.

=== Support-dependent relaxation rate ===

Note that the standard formulas do NOT minimize the function above. Indeed, if \eta is very small, then under the standard formulas the total change of p_u over one pass through the training set is:

: \left( \Delta p_u \right)^\mathrm{total} = \eta \left( \sum_{i \in I(u)} d_{ui} q_i - \lambda p_u S_u \right)

where I(u) is the set of items which appear together with u in the training set, and S_u = |I(u)| is the support for user u. Note the factor S_u in the second term: it would not be there if we really moved in the direction of the gradient, in which the relaxation term \lambda p_u appears only once per user rather than once per rating. This suggests decreasing the relaxation rate with support:

: \Delta q_i = \eta \left( d_{ui} p_u - \lambda_i q_i \right)
: \Delta p_u = \eta \left( d_{ui} q_i - \lambda_u p_u \right)

where \lambda_i = f(S_i) , \lambda_u = g(S_u) , and f and g are non-increasing functions. For Netflix, the following formula was useful:

: \lambda_u(S_u) = \lambda^{(1)} + {\lambda^{(2)} \over \max(S_u, \bar S)}

where

: S_u is the support for user u,
: \bar S is the average support over all users,

and similarly for items.

=== Support-dependent relaxation rate with regression to average ===

This combines both approaches above (see the variant-epoch sketch in the examples at the end of the page):

: \Delta q_i = \eta \left( d_{ui} p_u - \lambda_i (q_i - \bar q) \right)
: \Delta p_u = \eta \left( d_{ui} q_i - \lambda_u (p_u - \bar p) \right)

=== Different learning rates and relaxation rates ===

The relaxation rates for users and items might be different, and so might the learning rates. When biases ( \mu , b_i , b_u ) are used, they may have their own learning and relaxation rates.
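== Example: equivalence of the two models ==

The dimension-augmentation argument above is easy to verify numerically. Below is a minimal NumPy check; it is an illustration only (the language, names, and values are ours, not part of the jssvd tool).

<syntaxhighlight lang="python">
import numpy as np

# Append (b_u + mu, 1) to p_u and (1, b_i) to q_i, as in the text,
# and check that the plain dot product reproduces the biased prediction.
rng = np.random.default_rng(0)
mu, b_u, b_i = 3.6, 0.2, -0.1
p_u = rng.normal(size=4)
q_i = rng.normal(size=4)

p_aug = np.concatenate([p_u, [b_u + mu, 1.0]])  # p_{u,n+1}, p_{u,n+2}
q_aug = np.concatenate([q_i, [1.0, b_i]])       # q_{i,n+1}, q_{i,n+2}

assert np.isclose(p_aug @ q_aug, mu + b_u + b_i + p_u @ q_i)
</syntaxhighlight>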
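== Example: standard training loop ==

A minimal NumPy sketch of SGD with the standard formulas, including biases. This is not the jssvd implementation; the function name train_svd and the default values of eta, lam, etc. are illustrative assumptions.

<syntaxhighlight lang="python">
import numpy as np

def train_svd(ratings, n_users, n_items, n_factors=20,
              n_epochs=20, eta=0.005, lam=0.02, seed=0):
    """SGD for the biased model r_hat = mu + b_u + b_i + p_u^T q_i.

    ratings: list of (u, i, r) triples (the training set K).
    """
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in ratings])      # overall average rating
    b_u = np.zeros(n_users)
    b_i = np.zeros(n_items)
    # Small random init so the factor vectors are not all identical.
    P = rng.normal(scale=0.1, size=(n_users, n_factors))
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))

    for _ in range(n_epochs):
        for u, i, r in ratings:
            d = r - (mu + b_u[u] + b_i[i] + P[u] @ Q[i])  # residual d_ui
            # Standard update formulas; here the biases share eta and lam
            # with the factors, though as noted in the text they may have
            # their own learning and relaxation rates.
            b_u[u] += eta * (d - lam * b_u[u])
            b_i[i] += eta * (d - lam * b_i[i])
            p_old = P[u].copy()          # use the old p_u to update q_i
            P[u] += eta * (d * Q[i] - lam * P[u])
            Q[i] += eta * (d * p_old - lam * Q[i])
    return mu, b_u, b_i, P, Q
</syntaxhighlight>

A prediction for a pair (u, i) is then mu + b_u[u] + b_i[i] + P[u] @ Q[i].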
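== Example: variant updates ==

The sketch below performs one epoch of the combined variant (support-dependent relaxation rates with regression to average) for the bias-free model. The names lam1 and lam2 stand for \lambda^{(1)} and \lambda^{(2)} ; like everything else in this sketch, they are our own notation, not part of the tool.

<syntaxhighlight lang="python">
import numpy as np

def variant_epoch(ratings, P, Q, eta=0.005, lam1=0.01, lam2=0.1):
    """One pass of the combined variant for r_hat = p_u^T q_i.
    Updates P and Q in place."""
    # Support: number of training pairs per user and per item.
    S_u = np.zeros(P.shape[0])
    S_i = np.zeros(Q.shape[0])
    for u, i, _ in ratings:
        S_u[u] += 1
        S_i[i] += 1
    # lambda(S) = lambda1 + lambda2 / max(S, S_bar), per user / per item.
    lam_u = lam1 + lam2 / np.maximum(S_u, S_u.mean())
    lam_i = lam1 + lam2 / np.maximum(S_i, S_i.mean())

    # Mean vectors for regression to average, held fixed within the
    # epoch so they need not be recomputed at every step.
    p_bar = P.mean(axis=0)
    q_bar = Q.mean(axis=0)

    for u, i, r in ratings:
        d = r - P[u] @ Q[i]              # residual d_ui
        p_old = P[u].copy()              # old p_u for the q_i update
        P[u] += eta * (d * Q[i] - lam_u[u] * (P[u] - p_bar))
        Q[i] += eta * (d * p_old - lam_i[i] * (Q[i] - q_bar))
</syntaxhighlight>

Setting lam2 = 0 and replacing p_bar and q_bar with zero vectors recovers the standard formulas.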